On k-Median clustering in high dimensions
Author
Abstract
We study approximation algorithms for k-median clustering. We obtain small coresets for k-median clustering in metric spaces as well as in Euclidean spaces. Specifically, in $\mathbb{R}^d$, those coresets have size with only polynomial dependency on $d$. This leads to a $(1 + \varepsilon)$-approximation algorithm for k-median clustering in $\mathbb{R}^d$, with running time $O(ndk + 2^{(k/\varepsilon)^{O(1)}} d^2 n^{\sigma})$, for any $\sigma > 0$. This is an improvement over previous results [7, 20, 21]. We also provide fast constant-factor approximation algorithms for k-median clustering in finite metric spaces. We use those coresets to compute a $(1 + \varepsilon)$-approximate k-median clustering in the streaming model of computation, using only $O(k^2 d \varepsilon^{-2} \log^8 n)$ space, where the points are taken from $\mathbb{R}^d$. This is the first streaming algorithm for this problem whose space complexity has only polynomial dependency on the dimension.
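To make the objective concrete, the following minimal sketch computes the k-median cost (the sum of distances from each point to its nearest center) and compares it with the cost measured on a small weighted summary of the data. The summary here is a plain uniform sample, used only as a hypothetical stand-in: it is not the coreset construction of the paper and carries no $(1 + \varepsilon)$ guarantee. The function names `kmedian_cost` and `uniform_sample_summary`, the synthetic data, and the sample size are all assumptions made for illustration.

```python
import math
import random


def kmedian_cost(points, centers, weights=None):
    """Weighted sum of Euclidean distances from each point to its nearest center."""
    if weights is None:
        weights = [1.0] * len(points)
    return sum(
        w * min(math.dist(p, c) for c in centers)
        for p, w in zip(points, weights)
    )


def uniform_sample_summary(points, m, rng):
    """Hypothetical stand-in for a coreset: m uniform samples, each weighted n/m.

    This is NOT the paper's construction; it only illustrates how a weighted
    summary is evaluated against the same centers as the full data set.
    """
    n = len(points)
    sample = [rng.choice(points) for _ in range(m)]
    return sample, [n / m] * m


rng = random.Random(0)

# Assumed synthetic input: two well-separated Gaussian blobs in the plane (d = 2).
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2000)]
points += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(2000)]
centers = [(0.0, 0.0), (10.0, 10.0)]

full_cost = kmedian_cost(points, centers)
summary, weights = uniform_sample_summary(points, 400, rng)
summary_cost = kmedian_cost(summary, centers, weights)
relative_error = abs(summary_cost - full_cost) / full_cost
```

A coreset in the sense of the abstract must keep this relative error below $\varepsilon$ for every choice of k centers at once, which a uniform sample does not guarantee in general; the sketch only checks one fixed set of centers.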
Similar resources
A Hybrid Data Clustering Algorithm Using Modified Krill Herd Algorithm and K-MEANS
Data clustering is the process of partitioning a set of data objects into meaningful clusters or groups. Because clustering algorithms are widely used in many fields, much research is still devoted to finding the best and most efficient clustering algorithm. K-means is simple and easy to implement, but it is sensitive to the initialization of the cluster centers and can therefore become trapped in a local optimum. In this paper...
Linear Time Algorithms for Clustering Problems in Any Dimensions
We generalize the k-means algorithm presented by the authors [14] and show that the resulting algorithm can solve a larger class of clustering problems that satisfy certain properties (existence of a random sampling procedure and tightness). We prove these properties for the k-median and the discrete k-means clustering problems, resulting in $O(2^{(k/\varepsilon)^{O(1)}} dn)$ time $(1 + \varepsilon)$-approximation algorithms fo...
Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories
In recent years, the rapid growth of spatial trajectory data and the need to process it and extract useful information and meaningful patterns have attracted many researchers to the field of spatio-temporal trajectory clustering. The processing and analysis of these trajectories have resulted in the extraction of useful information whic...
A Graph-Based Clustering Approach to Identify Cell Populations in Single-Cell RNA Sequencing Data
Introduction: The emergence of single-cell RNA-sequencing (scRNA-seq) technology has provided new information about the structure of cells, and provided data with very high resolution of the expression of different genes for each cell at a single time. One of the main uses of scRNA-seq is data clustering based on expressed genes, which sometimes leads to the detection of rare cell populations. ...